How to train accurate BNNs for embedded systems?
A key enabler of deploying convolutional neural networks on
resource-constrained embedded systems is the binary neural network (BNN). BNNs
save on memory and simplify computation by binarizing both features and
weights. Unfortunately, binarization is inevitably accompanied by a severe
decrease in accuracy. To reduce the accuracy gap between binary and
full-precision networks, many repair methods have been proposed in the recent
past, which we have classified and put into a single overview in this chapter.
The repair methods are divided into two main branches, training techniques and
network topology changes, which can further be split into smaller categories.
The latter category introduces additional cost (energy consumption or
additional area) for an embedded system, while the former does not. From our
overview, we observe that progress has been made in reducing the accuracy gap,
but BNN papers are not aligned on what repair methods should be used to get
highly accurate BNNs. Therefore, this chapter contains an empirical review that
evaluates the benefits of many repair methods in isolation over the
ResNet-20 & CIFAR10 and ResNet-18 & CIFAR100 benchmarks. We found three repair
categories most beneficial: feature binarizer, feature normalization, and
double residual. Based on this review we discuss future directions and research
opportunities. We sketch the benefit and costs associated with BNNs on embedded
systems because it remains to be seen whether BNNs will be able to close the
accuracy gap while staying highly energy-efficient on resource-constrained
embedded systems.
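To make the binarization idea concrete, here is a minimal sketch in plain Python. It assumes the commonly used sign binarizer and the XNOR-plus-popcount form of the binary dot product; the abstract does not say which binarizer or kernel the chapter evaluates, and the function names and sample values are hypothetical.

```python
def binarize(values):
    """Map each real value to +1 or -1 with the sign function (0 maps to +1)."""
    return [1 if v >= 0 else -1 for v in values]


def pack_bits(signs):
    """Pack a +1/-1 vector into an int: bit i is set when signs[i] == +1."""
    word = 0
    for i, s in enumerate(signs):
        if s == 1:
            word |= 1 << i
    return word


def binary_dot(a_bits, b_bits, n):
    """Dot product of two packed +1/-1 vectors of length n via XNOR + popcount."""
    mask = (1 << n) - 1
    agree = ~(a_bits ^ b_bits) & mask   # XNOR: 1 where the two signs match
    matches = bin(agree).count("1")     # popcount
    return 2 * matches - n              # matches minus mismatches


# Hypothetical feature and weight values, for illustration only.
features = [0.7, -1.2, 0.1, -0.4]
weights = [-0.3, -0.8, 0.5, 0.9]

fb, wb = binarize(features), binarize(weights)
result = binary_dot(pack_bits(fb), pack_bits(wb), len(fb))
reference = sum(f * w for f, w in zip(fb, wb))  # same value, computed with multiplies
```

Each value is stored in one bit, and the multiply-accumulate collapses to bitwise operations, which is where the memory and compute savings mentioned in the abstract come from. The rounding to a sign is also where the information loss behind the accuracy gap originates.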
Memory and Parallelism Analysis Using a Platform-Independent Approach
Emerging computing architectures such as near-memory computing (NMC) promise
improved performance for applications by reducing the data movement between CPU
and memory. However, detecting such applications is not a trivial task. In this
ongoing work, we extend the state-of-the-art platform-independent software
analysis tool with NMC-related metrics such as memory entropy, spatial
locality, data-level, and basic-block-level parallelism. These metrics help to
identify applications that are more suitable for NMC architectures.
Comment: 22nd ACM International Workshop on Software and Compilers for
Embedded Systems (SCOPES '19), May 201
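As a rough illustration of two of the metrics named above, the following sketch computes a memory entropy and a simple spatial-locality score from an address trace. The definitions are assumptions made for this example (Shannon entropy over accessed addresses, and the fraction of consecutive accesses within a fixed byte window); the abstract does not give the tool's actual definitions, and the traces are hypothetical.

```python
import math
from collections import Counter


def memory_entropy(addresses):
    """Shannon entropy (in bits) of the distribution of accessed addresses."""
    counts = Counter(addresses)
    total = len(addresses)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def spatial_locality(addresses, window=64):
    """Fraction of consecutive accesses that are within `window` bytes of each other."""
    if len(addresses) < 2:
        return 0.0
    near = sum(1 for a, b in zip(addresses, addresses[1:]) if abs(b - a) <= window)
    return near / (len(addresses) - 1)


# Hypothetical traces: a sequential sweep versus one repeatedly accessed address.
sequential = [0x1000 + 8 * i for i in range(8)]
hot_spot = [0x1000] * 8

sweep_entropy = memory_entropy(sequential)   # many distinct addresses: high entropy
hot_entropy = memory_entropy(hot_spot)       # a single address: zero entropy
sweep_locality = spatial_locality(sequential)
```

Because both functions only need a list of addresses, they are independent of the platform that produced the trace, which matches the platform-independent framing of the abstract.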
The State of Utah v. NINE THOUSAND ONE HUNDRED AND NINETY NINE DOLLARS, UNITED STATES CURRENCY, ONE PAGER, SERIAL NO. 0701843, AND ONE 4-INCH SMITH AND WESSON .44 MAGNUM GUN, MODEL 29 : Brief of Appellee
APPEAL FROM THE THIRD JUDICIAL DISTRICT COURT, IN AND FOR SALT LAKE COUNTY, STATE OF UTA
PET-to-MLIR: A polyhedral front-end for MLIR
We present PET-to-MLIR, a new tool to enter the MLIR compiler framework from C source. The tool is based on the popular PET and ISL libraries for extracting and manipulating quasi-affine sets and relations, and Loop Tactics, a declarative optimizer. The use of PET brings advanced diagnosis and full support for C by relying on the Clang parser. ISL allows easy manipulation of the polyhedral representation and efficient code generation. Loop Tactics, on the other hand, enables us to detect computational motifs transparently and lift the entry point in MLIR, thus enabling domain-specific optimizations in general-purpose code. We demonstrate our tool using the Polybench/C benchmark suite and show that it can successfully lower most of the benchmarks to MLIR’s affine dialect. We believe that our tool can benefit research in the compiler community by providing an automatic way to translate C code to the MLIR affine dialect.
THOR: A Neuromorphic Processor with 7.29G TSOP^2/mm^2Js Energy-Throughput Efficiency
Neuromorphic computing using biologically inspired Spiking Neural Networks (SNNs) is a promising solution to meet the Energy-Throughput (ET) efficiency needed for edge computing devices. Neuromorphic hardware architectures that emulate SNNs in analog/mixed-signal domains have been proposed to achieve order-of-magnitude higher energy efficiency than all-digital architectures, but at the expense of limited scalability, susceptibility to noise, complex verification, and poor flexibility. On the other hand, state-of-the-art digital neuromorphic architectures focus on achieving either high energy efficiency (Joules/synaptic operation (SOP)) or high throughput efficiency (SOPs/second/area), resulting in poor ET efficiency. In this work, we present THOR, an all-digital neuromorphic processor with a novel memory hierarchy and neuron update architecture that addresses both energy consumption and throughput bottlenecks. We implemented THOR in 28nm FDSOI CMOS technology, and our post-layout results demonstrate an ET efficiency of 7.29G TSOP^2/mm^2Js at 0.9V, 400 MHz, which represents a 3X improvement over state-of-the-art digital neuromorphic processors.